167 research outputs found

    Discovering Latent Clusters from Geotagged Beach Images

    Abstract. This paper studies the problem of estimating the geographical locations of images. To build reliable geographical estimators, an important question is how to find distinguishable geographical clusters in the world. These clusters cover general geographical regions and are not limited to landmarks. Because each cluster pools more training samples, the geographical clusters lead to better recognition accuracy. Previous approaches build geographical clusters using heuristics or arbitrary map grids and cannot guarantee the effectiveness of the resulting clusters. This paper develops a new framework for geographical cluster estimation that employs latent variables to estimate the clusters. To solve this problem, it draws on recent progress in object detection and builds an efficient solver to find the latent clusters. The results on beach datasets validate our method.

    FPM: Fine Pose Parts-Based Model with 3D CAD Models

    We introduce a novel approach to the problem of localizing objects in an image and estimating their fine pose. Given exact CAD models and a few real training images with aligned models, we propose to leverage the geometric information from CAD models and the appearance information from real images to learn a model that can accurately estimate fine pose in real images. Specifically, we propose FPM, a fine pose parts-based model that combines geometric information, in the form of shared 3D parts in deformable part-based models, with appearance information, in the form of objectness, to achieve both fast and accurate fine pose estimation. Our method significantly outperforms current state-of-the-art algorithms in both accuracy and speed.
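
    The abstract builds on deformable part-based models. As a hedged illustration of that underlying scoring rule only (the generic DPM formulation, not FPM's shared 3D parts or objectness terms), the sketch below scores one root placement as the root filter response plus, for each part, the best trade-off between the part's filter response and a quadratic deformation cost. The precomputed response grids, the `(a, b, c, d)` deformation coefficients and the function names `part_score` and `dpm_score` are assumptions for illustration.

```python
"""Score of one root placement in a generic deformable part model (DPM).

Assumption: this is the standard DPM scoring rule that FPM builds on,
not FPM itself. Part filter responses are given as precomputed 2-D grids.
"""
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def part_score(response: Grid, anchor: Tuple[int, int],
               deform: Sequence[float]) -> float:
    """Best placement of one part: max over (x, y) of
    response[y][x] - (a*dx + b*dy + c*dx^2 + d*dy^2),
    where (dx, dy) is the displacement from the part's anchor."""
    a, b, c, d = deform
    ax, ay = anchor
    best = float("-inf")
    for y, row in enumerate(response):
        for x, r in enumerate(row):
            dx, dy = x - ax, y - ay
            cost = a * dx + b * dy + c * dx * dx + d * dy * dy
            best = max(best, r - cost)
    return best


def dpm_score(root_response: float, parts, bias: float = 0.0) -> float:
    """Root filter response plus the best deformed placement of every part.

    `parts` is a list of (response_grid, anchor, deform_coefficients)."""
    return root_response + bias + sum(part_score(*p) for p in parts)
```

    With a response grid `[[0.0, 1.0], [0.0, 5.0]]`, anchor `(0, 0)` and unit quadratic costs, the strong response at `(1, 1)` still wins after paying a deformation cost of 2, so the part contributes 3.0. The exhaustive loop is quadratic in grid size; practical DPM implementations use a generalized distance transform instead.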

    Chinese Sources on the Central Asian Dimension of PRC Regional Security Policy in the Post-Bipolar Period

    We propose a method for detecting dyadic interactions: fine-grained, coordinated interactions between two people. Our model is capable of recognizing interactions such as a hand shake or a high five, and of locating them in time and space. At the core of our method is a pictorial structures model that additionally takes into account the fine-grained movements around the joints of interest during the interaction. Compared to a bag-of-words approach, our method not only detects the specific type of action more accurately but also provides the specific location of the interaction. The model is trained with both video data and body joint estimates obtained from Kinect. During testing, only video data is required. To demonstrate the efficacy of our approach, we introduce the ShakeFive dataset, which consists of videos and Kinect data of hand shake and high five interactions. On this dataset, we obtain a mean average precision of 49.56%, outperforming a bag-of-words approach by 23.32%. We further demonstrate that the model can be learned from just a few interactions.
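
    The result above is reported as mean average precision. A minimal sketch of that metric follows, assuming each detection has already been matched to the ground truth and labelled true or false positive; the abstract does not state the matching criterion or whether AP is interpolated, so this uses plain non-interpolated AP, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Detection = Tuple[float, bool]  # (confidence score, is_true_positive)


def average_precision(detections: List[Detection], num_positives: int) -> float:
    """Non-interpolated AP: rank detections by descending score and average
    the precision at the rank of each true positive. Ground-truth instances
    that are never detected contribute zero."""
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, is_true_positive) in enumerate(ranked, start=1):
        if is_true_positive:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / num_positives if num_positives else 0.0


def mean_average_precision(
        per_class: Dict[str, Tuple[List[Detection], int]]) -> float:
    """Mean of the per-class APs (e.g. one class per interaction type)."""
    aps = [average_precision(dets, n) for dets, n in per_class.values()]
    return sum(aps) / len(aps)
```

    For detections ranked true, false, true against two ground-truth instances, the precisions at the hits are 1/1 and 2/3, giving an AP of about 0.833.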

    ClassCut for Unsupervised Class Segmentation

    Abstract. We propose a novel method for unsupervised class segmentation on a set of images. It alternates between segmenting object instances and learning a class model. The method is based on a segmentation energy defined over all images at the same time, which can be optimized efficiently by techniques previously used in interactive segmentation. Over iterations, our method progressively learns a class model by integrating observations over all images. In addition to appearance, this model captures the location and shape of the class with respect to an automatically determined coordinate frame common across images. This frame allows us to build stronger shape and location models, similar to those used in object class detection. Our method is inspired by interactive segmentation methods [1], but it is fully automatic and learns models characteristic of the object class rather than specific to one particular object or image. We experimentally demonstrate on the Caltech4, Caltech101, and Weizmann horses datasets that our method (a) transfers class knowledge across images, which improves results compared to segmenting every image independently; (b) outperforms GrabCut [1] for the task of unsupervised segmentation; and (c) offers competitive performance compared to the state of the art in unsupervised segmentation, in particular outperforming the topic model [2].

    Graph Mining for Object Tracking in Videos

    This paper shows a concrete example of the use of graph mining for tracking objects in videos with moving cameras and without any contextual information on the objects to track. To make the mining algorithm efficient, we benefit from a video representation based on dynamic (evolving through time) planar graphs. We then define a number of constraints to efficiently find our so-called spatio-temporal graph patterns. These patterns are linked through an occurrences graph, which allows us to tackle occlusion and the instability of graph features in the video. Experiments on synthetic and real videos show that our method is effective and finds relevant patterns for our tracking application.

    Character-level interaction in multimodal computer-assisted transcription of text images

    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-21257-4_85. To date, automatic handwritten text recognition systems are far from perfect, and heavy human intervention is often required to check and correct their results. As an alternative, previous works have presented an interactive framework that integrates human knowledge into the transcription process. In this work, multimodal interaction at the character level is studied. Until now, multimodal interaction had been studied only at the whole-word level. However, character-level pen-stroke interactions may lead to more ergonomic and user-friendly interfaces. Empirical tests show that this approach can save a significant amount of user effort with respect to both fully manual transcription and non-interactive post-editing correction.
    Work supported by the Spanish Government (MICINN and “Plan E”) under the MITTRAL (TIN2009-14633-C03-01) research project and under the research programme Consolider Ingenio 2010: MIPRCV (CSD2007-00018), and by the Generalitat Valenciana under grant Prometeo/2009/014.
    Martín-Albo Simón, D.; Romero Gómez, V.; Toselli, AH.; Vidal, E. (2011). Character-level interaction in multimodal computer-assisted transcription of text images. In Pattern Recognition and Image Analysis. Springer Verlag (Germany). 684-691. https://doi.org/10.1007/978-3-642-21257-4

    Approximation Algorithms for Connected Maximum Cut and Related Problems

    An instance of the Connected Maximum Cut problem consists of an undirected graph $G = (V, E)$, and the goal is to find a subset of vertices $S \subseteq V$ that maximizes the number of edges in the cut $\delta(S)$ such that the induced graph $G[S]$ is connected. We present the first non-trivial $\Omega(1/\log n)$ approximation algorithm for the connected maximum cut problem in general graphs using novel techniques. We then extend our algorithm to an edge-weighted case and obtain a poly-logarithmic approximation algorithm. Interestingly, in stark contrast to the classical max-cut problem, we show that the connected maximum cut problem remains NP-hard even on unweighted planar graphs. On the positive side, we obtain a polynomial-time approximation scheme for the connected maximum cut problem on planar graphs and, more generally, on graphs with bounded genus. Comment: 17 pages, Conference version to appear in ESA 201
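
    To make the problem definition concrete, here is a hedged brute-force sketch that enumerates every non-empty vertex subset $S$, keeps those whose induced subgraph $G[S]$ is connected, and maximizes $|\delta(S)|$ over the unweighted edges. It is exponential in $n$ and only illustrates the objective and constraint; it is not the approximation algorithm from the paper, and all names are hypothetical.

```python
from itertools import combinations
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def is_connected(nodes: Set[int], edges: List[Edge]) -> bool:
    """True if the induced subgraph G[nodes] is non-empty and connected."""
    if not nodes:
        return False
    adj = {v: [] for v in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            adj[u].append(v)
            adj[v].append(u)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:  # iterative depth-first search inside G[nodes]
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes


def cut_size(s: Set[int], edges: List[Edge]) -> int:
    """|delta(S)|: number of edges with exactly one endpoint in S."""
    return sum((u in s) != (v in s) for u, v in edges)


def connected_max_cut(n: int, edges: List[Edge]) -> Tuple[int, Set[int]]:
    """Exhaustive search over all S subset of {0..n-1} with G[S] connected."""
    best, best_s = -1, set()
    for k in range(1, n + 1):
        for combo in combinations(range(n), k):
            s = set(combo)
            if is_connected(s, edges):
                c = cut_size(s, edges)
                if c > best:
                    best, best_s = c, s
    return best, best_s
```

    On the path 0-1-2-3, the unconstrained maximum cut takes $S = \{0, 2\}$ and cuts all 3 edges, but $G[\{0, 2\}]$ is disconnected, so the connected optimum is only 2 (for example $S = \{1\}$). This gap is what separates the connected variant from classical max-cut.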

    Detection of visual defects in citrus fruits: multivariate image analysis vs graph image segmentation

    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-40261-6_28. This paper presents an application of visual quality control in orange post-harvesting that compares two different approaches. These approaches correspond to two very different methodologies from the area of Computer Vision. The first approach is based on Multivariate Image Analysis (MIA) and was originally developed for the detection of defects in random color textures. It uses Principal Component Analysis and the $T^2$ statistic to map the defective areas. The second approach is based on Graph Image Segmentation (GIS), an efficient segmentation algorithm that uses a graph-based representation of the image and a predicate to measure the evidence of boundaries between adjacent regions. While the MIA approach performs novelty detection on defects using a trained model of sound color textures, the GIS approach is strictly unsupervised and requires no training on sound or defective areas. Both methods are compared through experimental work performed on a ground truth of 120 citrus samples from four different cultivars. Although the GIS approach is faster and achieves better results in defect detection, the MIA method produces fewer false detections and does not need to assume that the largest area in a sample always corresponds to the non-damaged area.
    López García, F.; Andreu García, G.; Valiente González, JM.; Atienza Vanacloig, VL. (2013). Detection of visual defects in citrus fruits: multivariate image analysis vs graph image segmentation. In Computer Analysis of Images and Patterns. Springer Verlag (Germany). 8047:237-244. doi:10.1007/978-3-642-40261-6
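
    The MIA approach fits PCA to sound (defect-free) textures and flags areas whose $T^2$ statistic is large. A minimal NumPy sketch of that computation follows, assuming each row is a feature vector (for instance a pixel's colour values); how the original method builds its texture features and sets its threshold is not described here, so the feature layout, the choice of `k` and the function names are assumptions.

```python
import numpy as np


def fit_pca(train: np.ndarray, k: int):
    """PCA on feature vectors (rows) taken from sound, defect-free samples.

    Returns the mean, the top-k loading vectors (d x k) and their eigenvalues."""
    mean = train.mean(axis=0)
    cov = np.cov(train - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]       # keep the k largest components
    return mean, eigvecs[:, order], eigvals[order]


def hotelling_t2(x: np.ndarray, mean: np.ndarray,
                 loadings: np.ndarray, eigvals: np.ndarray) -> np.ndarray:
    """Hotelling's T^2 per row: sum over retained components a of t_a^2 / lambda_a,
    where t_a is the row's score on component a."""
    scores = (x - mean) @ loadings
    return np.sum(scores ** 2 / eigvals, axis=1)
```

    One way to use this (again an assumption, not the paper's rule) is to compute $T^2$ on the training rows, take a high percentile as the threshold, and mark any test pixel above it as a candidate defect. Rows that are 10 standard deviations out along the first component get $T^2 = 100$.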

    Tree-based Coarsening and Partitioning of Complex Networks

    Many applications produce massive complex networks whose analysis would benefit from parallel processing. Parallel algorithms, in turn, often require a suitable network partition. For solving optimization tasks such as graph partitioning on large networks, multilevel methods are preferred in practice. Yet, complex networks pose challenges to established multilevel algorithms, in particular to their coarsening phase. One way to specify a (recursive) coarsening of a graph is to rate its edges and then contract the edges as prioritized by the rating. In this paper we (i) define weights for the edges of a network that express the edges' importance for connectivity, (ii) compute a minimum-weight spanning tree $T^m$ with respect to these weights, and (iii) rate the network edges based on the conductance values of $T^m$'s fundamental cuts. To this end, we also (iv) develop the first optimal linear-time algorithm to compute the conductance values of all fundamental cuts of a given spanning tree. We integrate the new edge rating into a leading multilevel graph partitioner and equip the latter with a new greedy postprocessing for optimizing the maximum communication volume (MCV). Experiments on bipartitioning frequently used benchmark networks show that the postprocessing already reduces MCV by 11.3%. Our new edge rating further reduces MCV by 10.3% compared to the previously best rating with the postprocessing in place for both ratings. In total, with a modest increase in running time, our new approach reduces the MCV of complex network partitions by 20.4%.
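
    Step (iii) needs the conductance of every fundamental cut of a spanning tree: removing tree edge $(v, \mathrm{parent}(v))$ separates the subtree of $v$ from the rest of the graph. The hedged sketch below uses the identity $\mathrm{cut}(S) = \mathrm{vol}(S) - 2|E(S)|$ together with the fact that an edge lies inside the subtree of $v$ exactly when the lowest common ancestor (LCA) of its endpoints does. It assumes an unweighted graph with degree-based volume, takes the tree as input rather than computing $T^m$, and finds each LCA by naive climbing, so it is not the paper's optimal linear-time algorithm. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def fundamental_cut_conductances(n: int, edges: List[Edge],
                                 tree_edges: List[Edge], root: int = 0
                                 ) -> Dict[Edge, float]:
    """Conductance cut(S) / min(vol(S), vol(V - S)) for S = subtree(v),
    keyed by the tree edge (v, parent(v)). Unweighted graph assumed."""
    tree_adj = defaultdict(list)
    for u, v in tree_edges:
        tree_adj[u].append(v)
        tree_adj[v].append(u)
    # BFS from the root: parent pointers, depths and a top-down order.
    parent = [-1] * n
    depth = [0] * n
    seen = [False] * n
    seen[root] = True
    order = [root]
    for u in order:  # the list grows while we iterate over it
        for w in tree_adj[u]:
            if not seen[w]:
                seen[w] = True
                parent[w] = u
                depth[w] = depth[u] + 1
                order.append(w)

    degree = [0] * n
    inside = [0] * n  # number of graph edges whose endpoints' LCA is this node
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
        # Naive LCA by climbing: O(depth) per edge, so not linear overall.
        while depth[a] > depth[b]:
            a = parent[a]
        while depth[b] > depth[a]:
            b = parent[b]
        while a != b:
            a, b = parent[a], parent[b]
        inside[a] += 1

    # Bottom-up sums give vol(subtree(v)) and |E(subtree(v))| for every v.
    vol = degree[:]
    internal = inside[:]
    for v in reversed(order):
        if parent[v] != -1:
            vol[parent[v]] += vol[v]
            internal[parent[v]] += internal[v]

    total = vol[root]
    result = {}
    for v in order[1:]:
        cut = vol[v] - 2 * internal[v]
        denom = min(vol[v], total - vol[v])
        result[(v, parent[v])] = cut / denom if denom else 0.0
    return result
```

    On the 4-cycle 0-1-2-3-0 with spanning path 0-1-2-3, the middle tree edge splits the cycle into two halves of volume 4 crossed by 2 edges, giving conductance 0.5, while both end edges give 1.0. Replacing the climbing loop with an offline LCA method is the usual way to bring this close to linear time.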